Process mining on noisy logs - Can log sanitization help to improve performance?
Abstract
Process mining techniques are designed to read process logs and extract process models from them. However, real-world logs are often noisy, and such logs produce poor, spaghetti-like process models. We propose a technique to sanitize noisy logs by first building a classifier on a subset of the log and then applying the classifier's rules to remove noisy traces from the log. The improvement in the quality of the resulting process models is evaluated on synthetic logs from benchmark models of increasing complexity, using both behavioral and structural recall and precision metrics. The results show that models mined from such preprocessed logs are superior on several evaluation metrics: they show better fidelity to the reference models and are also more compact, with fewer elements. A useful feature of the rule-based approach is that it generalizes to any noise pattern, which matters because the nature of noise varies from one log to another. The rules can also be explained and may be further modified manually. We also report results from experiments on a real dataset.
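The abstract does not say which classifier, trace features, or labeling procedure the authors use, so the following is only a minimal Python sketch of the general idea under stated assumptions: a small subset of traces has already been labeled noisy or clean, each trace is described by its directly-follows pairs (activity b occurs immediately after activity a), and a rule "IF the trace contains pair (a, b) THEN it is noisy" is kept when the pair occurs in noisy training traces but never in clean ones. The names `directly_follows`, `learn_noise_rules`, `sanitize`, and the `min_support` parameter are hypothetical and are not the paper's method.

```python
# Hypothetical sketch of rule-based log sanitization; NOT the paper's classifier.
# Assumptions: a labeled subset of traces exists, and directly-follows pairs
# serve as the only features for the noise rules.
from collections import Counter
from typing import List, Sequence, Set, Tuple

Trace = Tuple[str, ...]
Pair = Tuple[str, str]


def directly_follows(trace: Sequence[str]) -> Set[Pair]:
    """Return the set of (a, b) pairs where activity b immediately follows a."""
    return {(trace[i], trace[i + 1]) for i in range(len(trace) - 1)}


def learn_noise_rules(labeled: List[Tuple[Trace, bool]], min_support: int = 1) -> Set[Pair]:
    """Learn rules of the form 'IF trace contains pair (a, b) THEN noisy'.

    A pair becomes a rule when it occurs in at least `min_support` noisy
    training traces and in no clean training trace.
    """
    noisy_counts = Counter()       # pair -> number of noisy traces containing it
    clean_pairs: Set[Pair] = set()  # every pair seen in some clean trace
    for trace, is_noisy in labeled:
        pairs = directly_follows(trace)
        if is_noisy:
            noisy_counts.update(pairs)
        else:
            clean_pairs |= pairs
    return {p for p, n in noisy_counts.items() if n >= min_support and p not in clean_pairs}


def sanitize(log: List[Trace], rules: Set[Pair]) -> List[Trace]:
    """Drop every trace that fires at least one noise rule; keep the rest in order."""
    return [t for t in log if not (directly_follows(t) & rules)]


if __name__ == "__main__":
    # Hypothetical labeled subset: (trace, is_noisy)
    training = [
        (("a", "b", "c", "d"), False),
        (("a", "c", "b", "d"), False),
        (("a", "b", "d"), True),
        (("a", "d", "b", "c"), True),
    ]
    rules = learn_noise_rules(training)
    print(sorted(rules))  # [('a', 'd'), ('d', 'b')]

    log = [("a", "b", "c", "d"), ("a", "d"), ("a", "c", "b", "d"), ("a", "d", "b", "c")]
    print(sanitize(log, rules))  # [('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd')]
```

Because each learned rule is a readable (a, b) pair, it can be inspected or edited by hand before the log is filtered, in the spirit of the explainability the abstract describes. This simple rule form can also miss noise: in the demo, the noisy trace ("a", "b", "d") yields no rule because all of its pairs also appear in clean traces.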
Similar resources
Experience Mining Google's Production Console Logs
We describe our early experience in applying our console log mining techniques [19, 20] to logs from production Google systems with thousands of nodes. This data set is five orders of magnitude in size and contains almost 20 times as many message types as the Hadoop data set we used in [19]. It also has many properties that are unique to large scale production deployments (e.g., the system sta...
A Study of Quality and Accuracy Trade-offs in Process Mining
The goal of process mining is to extract semantic knowledge from a log consisting of process execution traces for the purposes of process understanding, innovation and improvement. In recent years many algorithms have been proposed to extract process models from logs. The process models describe the ordering relationships between tasks in a process in terms of standard constructs like sequence,...
A method to solve the problem of missing data, outlier data and noisy data in order to improve the performance of human and information interaction
Purpose: Errors in data collection, and failure to notice data that become noisy during collection for any reason, cause problems in data-based analysis and, as a result, lead to wrong decisions. Therefore, solving the problem of missing or noisy data before processing and analysis is of vital importance in analytical systems. The purpose of this paper is to provide a metho...
On Global Completeness of Event Logs
The field of process mining provides a collection of techniques and tools that aim to support the extraction of information out of event logs. This information may provide businesses insight into actual execution and performance of their business processes and may help identify ways of improving these processes. While the quality of the results of the application of mining algorithms depends on...
Efficient Frequent Pattern Mining on Web Log Data
Mining frequent patterns from web log data can help to optimise the structure of a web site and improve the performance of web servers. Web users can also benefit from these frequent patterns. Much effort has gone into mining frequent patterns efficiently. Candidate-generation-and-test approach (Apriori and its variants) and pattern-growth approach (FP-growth and its variants) are the two re...
Journal title:
- Decision Support Systems
Volume 79
Publication date: 2015